Abstract

We present an improvement on Thurley's recent randomized approximation scheme for #k-SAT,
where the task is to count the number of satisfying truth assignments of a Boolean function
$\Phi$ given as an n-variable k-CNF. We introduce a novel way to identify independent
substructures of $\Phi$ and can therefore reduce the size of the search space considerably.
Our randomized algorithm works for any k. For #3-SAT, it runs in time
$O(\varepsilon^{-2} \cdot 1.51426^n)$, for #4-SAT, it runs in time
$O(\varepsilon^{-2} \cdot 1.60816^n)$, with error bound $\varepsilon$.

Keywords: Algorithms; analysis of algorithms; randomized algorithms; #k-SAT; satisfiability.

1 Introduction 

Background. The satisfiability problem (SAT) is one of the classical and central problems in
algorithm theory. Its prominent role in Computer Science has even been compared [15] to the one
that Drosophila (the fruit fly) has in Genetics. Given a Boolean formula $\Phi$ in conjunctive
normal form (CNF) on n variables with m clauses, it has to be determined whether there is a
satisfying assignment for $\Phi$ (and in this case, to determine one) or not. If every clause of
$\Phi$ has length at most k, $\Phi$ is called a k-CNF and the problem is dubbed k-SAT. It is well
known (for a comprehensive overview, see [3]) that k-SAT is NP-complete for any $k \ge 3$, and
that it can be solved in time linear in the input length for k = 2 [1]. So it is generally
assumed that there is no polynomial time algorithm solving k-SAT for $k \ge 3$. In particular,
3-SAT has attracted much attention because of its "borderline" status.

There is a rich history of developing both deterministic and randomized algorithms with running
time $o(2^n)$ solving k-SAT. The currently fastest deterministic algorithm for 3-SAT runs in time¹
$O^*(1.3303^n)$ [11], the fastest randomized algorithm has a running time of
$O^*(\log(\delta^{-1}) \cdot 1.30704^n)$ [3]. In the randomized setting, the use of $\delta$ means
the following: If $\Phi$ is not satisfiable, the algorithm returns the correct answer. If $\Phi$
is satisfiable, it returns a satisfying assignment with probability $1 - \delta$.
Table 1 presents all best running times currently known to solve k-SAT.

For many combinatorial problems including k-SAT, it is often not only important to determine
one solution (if it exists), but also to determine the number of all different solutions. A famous
example from statistical physics is the computation of the number of configurations in
monomer-dimer systems (for an overview, see [8]). The complexity class that corresponds to these
counting problems is #P, and #SAT, the problem to determine the number of satisfying assignments,
is well-known to be #P-complete. More exactly, let #k-SAT denote the problem to determine
$\#\Phi$, i.e., for input $\Phi$ being a k-CNF, the number of satisfying assignments. Then, it is
known [17] that #k-SAT is #P-complete for $k \ge 2$.

¹In this context, the notation $O^*(\cdot)$ is commonly used to suppress factors that are of size $2^{o(n)}$.

Table 1: Previous and new results for k-SAT and #k-SAT where the input k-CNF has n variables
and m clauses. The times are given in $O^*(\cdot)$ notation. $\beta_k$ is the base-2 logarithm of
the base of the running time in column "k-SAT rand". For definition of $\mu_k$, see Sec. 2.2.
$\psi_k$ is the largest root of $1 - 2z^k + z^{k+1} = 0$; $2^{1/(2-\beta_k)} > \alpha_k$.

Topic of this work. In the area of combinatorial counting problems, there is also the problem of
approximating the wanted number. In particular, there is the task to develop so-called randomized
approximation schemes that receive as input $\Phi$ and an arbitrarily small bound $\varepsilon$ on
the maximum admissible error and that compute with some fixed probability greater than 1/2 an
$\varepsilon$-estimate of $\#\Phi$ (for exact definitions, see Sec. 2). In a recent paper,
Thurley [16] presents such a randomized approximation scheme for #k-SAT that has, for k = 3,
running time $O^*(\varepsilon^{-2} \cdot 1.5366^n)$, and for k = 4,
$O^*(\varepsilon^{-2} \cdot 1.6155^n)$. A detailed description of Thurley's algorithm is presented
in Sec. 2. Table 1 also presents all best running times currently known to solve #k-SAT.

A different approach by Impagliazzo et al. [^ leads to a randomized Las Vegas algorithm for 
#k-SAT that always returns the exact solution and has expected running time $O^*(2^{(1-1/(30k))n})$.
Note that for any k, Thurley's algorithm is faster than this method.

New Results. We present a randomized approximation scheme for #k-SAT that takes the input
k-CNF much more into account than Thurley's algorithm. In particular, we present a method that
determines a large set of maximal independent subformulas of $\Phi$, i.e., the subformulas have no
variables in common and can therefore be treated independently. As they are maximal, they convert
the remaining clauses into clauses of length k - 1. Hence, the search space is substantially
reduced. Our scheme, which works for any #k-SAT instance, has for #3-SAT running time
$O(\varepsilon^{-2} \cdot 1.51426^n)$, and for #4-SAT, it works in time
$O(\varepsilon^{-2} \cdot 1.60816^n)$. Note that our scheme is for all k faster than
Thurley's scheme.

Organization of Paper. In the next section, we define the necessary terms, and we give a 
comprehensive description of Thurley's randomized approximation scheme. In Sec. 3, we present a
first improvement that exploits single clauses. Generalizing this approach and building upon each
other, we present further improvements based on large sets of maximal independent clauses (Sec. 4),
and on large sets of maximal independent subformulas (Sec. 5).
Figure 1: (a) Elimination tree and (b) a 3-cut for $\Phi = (x_1 \vee x_2) \wedge (x_2 \vee x_3)$,
due to the elimination order $(x_2, x_1, x_3)$. The sum of the leaves is $\#\Phi = 4$. Satisfiable
nodes are boxed. Note that $\#\Phi$ can already be computed from the nodes on level 1.



2 Elimination Trees, Monte Carlo Counting, and Thurley's Algorithm

Let $\Phi$ be a k-CNF, i.e., a Boolean function given in conjunctive normal form with n different
variables $x_1, \ldots, x_n$ on m different clauses such that every clause has length at most k.
For an arbitrary Boolean formula $\varphi$, let $\mathrm{Var}(\varphi)$ denote the variables that
occur in $\varphi$. Let $b : \mathrm{Var}(\varphi) \to \{0, 1\}$ be a partial assignment of truth
values to the variables in $\varphi$. By $\varphi_b$ we denote the formula we obtain from
$\varphi$ by fixing in $\varphi$ the variables according to b.

There is a nice interpretation of #k-SAT in terms of complete binary trees of height n (i.e.,
having levels 0, ..., n) that is sometimes used in the context of counting. An elimination tree
for a k-CNF $\Phi$ can be defined as follows. Fix an elimination order $(y_1, \ldots, y_n)$ of the
variables. Every node $\varphi$ of the tree corresponds to a Boolean formula. The root (on level 0)
of the tree is $\Phi$. Every node $\varphi$ on level i, $0 \le i < n$, has two children: One child
is $\varphi_{y_{i+1}=0}$, the other one is $\varphi_{y_{i+1}=1}$. So a path from the root to a leaf
corresponds to an assignment to the variables, and the formula at a leaf is either 0 or 1.
$\#\Phi$ is the number of leaves marked 1. The mark 1 is additionally broadcast to all internal
nodes on a path from a 1-leaf to the root, i.e., it is visible on every node $\varphi$ whether
$\varphi$ is satisfiable or not. For a small example, see Fig. 1(a).

Let $\ell$ be a positive integer. An $\ell$-cut of an elimination tree is an arbitrary connected
subtree that contains the root, only 1-nodes, and has $\ell$ leaves (w.r.t. the subtree). For an
example, see Fig. 1(b). An $\ell$-cut contains at most $n \cdot \ell$ nodes. From determining an
$\ell$-cut, $\#\Phi \ge \ell$ immediately follows. Note that the elimination order significantly
influences the moment when in the elimination tree the number of satisfying assignments can be
determined.
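The definitions above can be made concrete with a small Python sketch that walks the binary elimination tree in a fixed order, descends only into 1-nodes, and stops as soon as $\ell$ leaves have been found. The clause encoding (signed integers), the function names and the brute-force satisfiability test are assumptions of this sketch; the schemes discussed in this paper rely on fast randomized k-SAT algorithms for that test. The example formula is hypothetical and not the one from Fig. 1.

```python
from itertools import product

def satisfiable(clauses, fixed, n):
    """Brute-force stand-in (assumption) for a k-SAT decision procedure."""
    free = [v for v in range(1, n + 1) if v not in fixed]
    for bits in product((False, True), repeat=len(free)):
        b = {**fixed, **dict(zip(free, bits))}
        # literal l > 0 means x_l, literal l < 0 means "not x_|l|"
        if all(any(b[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def cut(clauses, order, ell, fixed=None, found=0):
    """Count 1-leaves below the node described by `fixed`, stopping at ell.

    Called on the root, it returns min(number of satisfying assignments, ell)."""
    fixed = {} if fixed is None else fixed
    n = len(order)
    if found >= ell or not satisfiable(clauses, fixed, n):
        return found                      # 0-node, or the cut is already complete
    if len(fixed) == n:
        return found + 1                  # 1-leaf
    y = order[len(fixed)]                 # next variable in the elimination order
    for value in (False, True):
        found = cut(clauses, order, ell, {**fixed, y: value}, found)
    return found

# Hypothetical example: (x1 or not x2) and (x2 or x3), elimination order (x2, x1, x3)
phi = [[1, -2], [2, 3]]
print(cut(phi, order=(2, 1, 3), ell=3))   # a 3-cut exists
print(cut(phi, order=(2, 1, 3), ell=10))  # fewer than 10 solutions: exact count
```

If the returned value is smaller than `ell`, it is the exact count; otherwise only the lower bound $\#\Phi \ge \ell$ is known.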

Let $\varepsilon > 0$ be an arbitrarily small number; $\varepsilon$ is the upper bound on the
admissible relative error. A number L is called an $\varepsilon$-approximation of $\#\Phi$ if
$(1 - \varepsilon) \cdot \#\Phi \le L \le (1 + \varepsilon) \cdot \#\Phi$. A randomized
approximation scheme (RAS) A is a randomized algorithm that computes on inputs $\Phi$ and error
$\varepsilon$ a number $A(\Phi, \varepsilon)$ such that
$\Pr[A(\Phi, \varepsilon) \text{ is an } \varepsilon\text{-approximation}] \ge 3/4$. Note that by
the median of means method, the probability can be boosted to any number $1 - \delta$, for
$\delta$ being arbitrarily close to 0. Algorithm A, which usually outputs a mean, is repeated
$R = \Theta(\log \delta^{-1})$ times, and the median of the R values computed is returned.
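The boosting step can be sketched in a few lines of Python. The helper name `amplify` and the constant in the repetition count are assumptions made for illustration; the text only requires a number of repetitions that grows like $\log \delta^{-1}$.

```python
import math
import statistics

def amplify(estimator, delta, c=12):
    """Run `estimator` R = ceil(c * ln(1/delta)) times and return the median.

    `estimator` is any zero-argument callable whose output is a good estimate
    with probability well above 1/2; the constant c is an illustrative choice."""
    repetitions = max(1, math.ceil(c * math.log(1 / delta)))
    return statistics.median(estimator() for _ in range(repetitions))
```

The median is insensitive to a minority of bad runs, which is why a constant success probability above 1/2 per run is enough.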



2.1 Monte Carlo Counting 

A very simple, general Monte Carlo approach for counting works as follows. Let S be the set whose
cardinality has to be computed, and let $U \supseteq S$ be a superset of S such that $|U|$ can be
computed exactly and, preferably, fast. Sample independently and uniformly at random T elements
from U, and let L be the number of elements from S among these T samples. Return the number
$\frac{L}{T} \cdot |U|$ as an approximation of $|S|$. Standard probability theory (e.g., see
[13, p. 311]) gives that a sample size of $T = \Theta(\varepsilon^{-2} \cdot |U|/|S|)$ suffices to
ensure the RAS property.

For #k-SAT, $U = \{0, 1\}^n$ may be chosen. If somehow a lower bound $\ell$ on $\#\Phi$ is known,
the Monte Carlo approach immediately gives a RAS with running time
$T_{\mathrm{MC}} = O(\varepsilon^{-2} \cdot 2^n/\ell \cdot (n + km)) = O^*(\varepsilon^{-2} \cdot 2^n/\ell)$.
If $\ell$ (or $\#\Phi$) is small, this is unfortunately an unsatisfactory upper bound. We refer
to this algorithm as MC($\Phi$, $\ell$, $\varepsilon$). If the reliability is amplified to
$1 - \delta$ by the median of means method, the running time is
$O^*(\log(\delta^{-1}) \cdot \varepsilon^{-2} \cdot 2^n/\ell)$, and we write
MC($\Phi$, $\ell$, $\varepsilon$, $\delta$).
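A minimal Python sketch of this basic estimator, with $U = \{0,1\}^n$, may help to fix ideas. The clause encoding, the function names and the constant `c` in the sample size are assumptions of the sketch and not part of the analysis above.

```python
import math
import random

def sample_size(eps, n, ell, c=4.0):
    """T = c * eps^-2 * 2^n / ell; the constant c is an assumption of this sketch."""
    return math.ceil(c * eps ** -2 * 2 ** n / ell)

def mc_count(clauses, n, num_samples, rng):
    """Estimate the number of satisfying assignments by sampling from {0,1}^n."""
    hits = 0
    for _ in range(num_samples):
        b = [rng.random() < 0.5 for _ in range(n)]       # uniform element of U
        # literal l > 0 means x_l, literal l < 0 means "not x_|l|"
        if all(any(b[abs(l) - 1] == (l > 0) for l in clause) for clause in clauses):
            hits += 1                                    # the sample lies in S
    return hits / num_samples * 2 ** n                   # (L / T) * |U|

rng = random.Random(0)
phi = [[1, 2, 3], [-1, 4, 5]]     # hypothetical 3-CNF on n = 5 variables
T = sample_size(eps=0.1, n=5, ell=8)
print(T, mc_count(phi, 5, T, rng))
```

The number of samples grows with $2^n/\ell$, which is exactly why a good lower bound $\ell$ matters.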

Note that for the similar problem #DNF where a Boolean formula in disjunctive normal form
is given, a set U can be devised [8] with $|U| \le m \cdot \#\Phi$, yielding a RAS with polynomial
running time $O(\varepsilon^{-2} \cdot m \cdot (n + km)) = O^*(\varepsilon^{-2})$.



2.2 Thurley's RAS

The running time of MC is decreasing in $\ell$. Therefore, Thurley presented an algorithm that,
for input $\Phi$ and $\ell$, determines whether there are at least $\ell$ satisfying assignments.
This time the running time is increasing in $\ell$. In a last step, $\ell$ is chosen such that it
balances the running times of MC and Thurley's approach. In the following, we explain this method
in detail because it is also the starting point for our improvements.

Let $\beta_k$ denote the smallest known constant such that there is a randomized algorithm solving
k-SAT in time $O^*(2^{\beta_k \cdot n})$. Hence, $\beta_3 \approx 0.3864$,
$\beta_4 \approx 0.5548$, and, for $k \ge 5$, $\beta_k = 1 - \mu_k/(k-1)$, where

$$\mu_k = \sum_{j=1}^{\infty} \frac{1}{j \left( j + \frac{1}{k-1} \right)}$$

is the constant involved in the PPSZ algorithm [14].

Thurley's RAS works as follows: It has as input a k-CNF $\Phi$ and a bound $\ell$. It computes an
$\ell$-cut of the elimination tree (if it exists). Whether a node is a 1-node can be checked fast
with the known randomized k-SAT decision algorithms mentioned above where $\delta = 2^{-cn}$, for
some constant c > 2. We call this the $\ell$-cut phase.

If it cannot find an $\ell$-cut, it reports the number of subtree leaves it actually has found in
the cut as estimation on $\#\Phi$ (in fact, this number is even the correct value, w.h.p.). If, on
the other hand, $\ell$ subtree leaves have been determined, this means that a lower bound of
$\ell$ on $\#\Phi$ has been determined. In this case, the Monte Carlo algorithm
MC($\Phi$, $\ell$, $\varepsilon$) is executed to approximate $\#\Phi$ with error $\varepsilon$.

Figure 2: Possible pruned elimination tree obtained by elimination w.r.t. clause
$(x_1 \vee x_2)$ at the root and, therefore, chosen variables $x_1, x_2$, and then on level 1,
w.r.t. clause $(x_3 \vee x_4)$ on the left node, and clause $x_3$ on the right node. For both
nodes, the chosen variables are $x_3, x_4$. Note that this tree has just 6 leaves (the upper bound
is 9) rather than 16 leaves as in the binary case from Sec. 2. Also note that in general, on the
same level different variables at different nodes may be chosen.

Since in the worst case $2^i$ nodes (formulas) are on level i and i variables have already been
fixed, the running time of the $\ell$-cut phase is

$$T_{\mathrm{Cut}} = O^*\Big( \sum_{i=0}^{\log(n \cdot \ell)} 2^i \cdot 2^{\beta_k \cdot (n-i)} \Big)
= O^*\Big( 2^{\beta_k \cdot n} \cdot \sum_{i=0}^{\log(n \cdot \ell)} 2^{(1-\beta_k) \cdot i} \Big)
= O^*\big( 2^{\beta_k \cdot n} \cdot \ell^{1-\beta_k} \big).$$

Hence, the overall running time is at most
$T_{\mathrm{Cut}} + T_{\mathrm{MC}} = O^*(2^{\beta_k \cdot n} \cdot \ell^{1-\beta_k} + \varepsilon^{-2} \cdot 2^n/\ell)$
which becomes $O^*(\varepsilon^{-2} \cdot (2^{1/(2-\beta_k)})^n)$ when
$\ell = 2^{n \cdot (1-\beta_k)/(2-\beta_k)}$. For k = 3, this is
$O(\varepsilon^{-2} \cdot 1.5366^n)$, and, for k = 4, it is $O(\varepsilon^{-2} \cdot 1.6155^n)$.
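The balancing argument can be checked numerically. The following Python sketch computes, per variable, the exponent of the balancing $\ell$ and the resulting base of the running time from a given $\beta_k$; the function name is an assumption, and the values of $\beta_3$ and $\beta_4$ are the rounded ones quoted in the text.

```python
def thurley_balance(beta_k):
    """Return (log2(ell) / n, base of the overall running time) at the balance point."""
    log_ell_per_n = (1 - beta_k) / (2 - beta_k)
    base = 2 ** (1 / (2 - beta_k))
    return log_ell_per_n, base

for k, beta_k in ((3, 0.3864), (4, 0.5548)):
    x, base = thurley_balance(beta_k)
    cut_base = 2 ** (beta_k + (1 - beta_k) * x)   # from 2^{beta_k n} * ell^{1 - beta_k}
    mc_base = 2 ** (1 - x)                        # from 2^n / ell
    print(k, round(x, 4), round(cut_base, 4), round(mc_base, 4), round(base, 4))
```

Both phases end up with the same base, which is the defining property of the balance point.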

The whole algorithm in both phases does not exploit the actual structure of $\Phi$, which opens
the possibility for improvements.



3 Pruning the Tree: Taking Single Clauses into Account 

The first possibility to slightly improve Thurley's RAS considers the clauses that occur in the
formulas $\varphi$ that are associated with the nodes of the elimination tree. The following
approach is also the basis for the further improvements described in the subsequent sections. If
C is a clause of $\varphi$ (in the following, we will write $C \in \varphi$) and consists of $k'$
($k' \le k$) literals, there is exactly one truth assignment to the $k'$ variables of C such that
C is not satisfied. So any assignment that incorporates this specific assignment does not
contribute to $\#\varphi$.

With this observation, it is possible to modify the elimination tree as follows. The root is the
input k-CNF $\Phi$. For a node's formula $\varphi$, choose one of its clauses, C, having $k'$
literals, then choose k variables including all variables from C, and plug into the formula only
those $2^k - 2^{k-k'}$ ($< 2^k$) truth assignments that satisfy C. For every formula obtained in
this way, a new node is generated. That means that we increase the degree of the elimination tree
from 2 to up to $2^k - 1$ and reduce its height to $\lceil n/k \rceil$. This tree is substantially
smaller than the binary elimination tree described in Sec. 2. The total number of nodes is at most
$O((2^k - 1)^{n/k})$. For k = 3, this is $O(1.91294^n)$. For a 2-CNF example, see Fig. 2.

An $\ell$-cut of such an elimination tree is now used in the cut phase of Thurley's RAS. Now, the
running time of the cut phase is

$$T_{\mathrm{Cut}} = O^*\Big( \sum_{i=0}^{\log_{2^k-1}(n \cdot \ell)} (2^k-1)^i \cdot 2^{\beta_k \cdot (n - k \cdot i)} \Big)
= O^*\big( 2^{\beta_k \cdot n} \cdot \ell^{1 - \beta_k \cdot k/\log(2^k-1)} \big). \qquad (1)$$

If we use this pruned elimination tree to determine an $\ell$-cut and then (if necessary) use the
Monte Carlo approach for computing the estimate with error at most $\varepsilon$, we choose $\ell$
for balancing the two phases as follows:

$$\log \ell = \frac{1 - \beta_k}{2 - \beta_k \cdot k/\log(2^k-1)} \cdot n.$$

We can state the result of this section with 
Theorem 1. Our pruned elimination tree algorithm is a RAS for #k-SAT. Its running time is
$O(\varepsilon^{-2} \cdot (2^{\rho_k})^n)$, where
$\rho_k = 1 - \frac{1 - \beta_k}{2 - \beta_k \cdot k/\log(2^k-1)}$.

So for k = 3, the running time is $O(\varepsilon^{-2} \cdot 1.5298^n)$, and for k = 4, it is
$O(\varepsilon^{-2} \cdot 1.6122^n)$ which is already slightly better than the running time of
Thurley's RAS (see Table 1).
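The constants of this section can be reproduced with a short Python computation that balances the cut-phase bound for the pruned tree against the Monte Carlo bound $2^n/\ell$. The function name is an assumption, and the values of $\beta_k$ are the rounded ones quoted earlier, so the last printed digit may differ slightly from the figures in the text.

```python
import math

def pruned_balance(k, beta_k):
    """Return (log2(ell) / n, base of the running time) for the pruned tree."""
    shrink = beta_k * k / math.log2(2 ** k - 1)   # exponent lost per unit of log2(ell)
    log_ell_per_n = (1 - beta_k) / (2 - shrink)
    base = 2 ** (1 - log_ell_per_n)               # Monte Carlo side: 2^n / ell
    return log_ell_per_n, base

for k, beta_k in ((3, 0.3864), (4, 0.5548)):
    x, base = pruned_balance(k, beta_k)
    cut_base = 2 ** (beta_k + (1 - beta_k * k / math.log2(2 ** k - 1)) * x)
    print(k, round(x, 4), round(cut_base, 4), round(base, 4))
```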

From this first improvement approach we already learn that by inspecting $ we can reduce the 
size of the elimination tree. But we can see more. If we can simultaneously find several clauses 
that do not share variables, we even can easily exclude from the elimination tree all assignments 
that do not satisfy these clauses. 

As the tree computed in this section may be unbalanced and different variable choices on the
same level are possible, sampling assignments uniformly with the help of the tree may not be
possible. That means that this pruned tree approach presumably cannot be used to also speed up
the Monte Carlo approach, except for the transition from level 0 to level 1, where the mentioned
problem does not occur.

In the next sections, we construct larger structures that reduce the size of the elimination tree 
further and that allow for faster uniform sampling. 



4 Taking Sets of Independent Clauses into Account 

For a k-CNF $\Phi$, two clauses of $\Phi$ are called independent if they have no variable in
common. A subformula $\psi \subseteq \Phi$ is called independent if the clauses in $\psi$ are
pairwise independent. $\psi$ is called maximal if every clause in $\Phi$ shares at least one
variable with a clause in $\psi$. Let $|\psi|$ denote the number of clauses in $\psi$.

4.1 Speeding up the Monte Carlo counting 

In the running time $T_{\mathrm{MC}}$ of the Monte Carlo algorithm from Sec. 2.1, the cardinality
of U plays a very important role. Suppose that an independent subformula $\psi$ has been somehow
(see Sec. 4.2 below) computed. Exploiting $\psi$, one can use a significantly smaller set
$U_\psi$, namely $U_\psi = \{b \in \{0,1\}^n \mid b \text{ satisfies } \psi\}$ which is obviously
a superset of $S = \{b \in \{0,1\}^n \mid b \text{ satisfies } \Phi\}$. As the clauses in $\psi$
are independent, the size of $U_\psi$ can be bounded as follows:
$|U_\psi| \le (2^k - 1)^{|\psi|} \cdot 2^{n - k|\psi|} = 2^n \cdot (1 - 2^{-k})^{|\psi|}$.
Sampling assignments from $U_\psi$ is simply done by choosing u.a.r. one of the at most $2^k - 1$
satisfying assignments to the variables of each $C \in \psi$, and assigning 0 or 1 to each of the
remaining variables. Provided $\ell \le \#\Phi$, the running time is
$T_{\mathrm{MC}} = O^*(\varepsilon^{-2} \cdot 2^n/\ell \cdot (1 - 2^{-k})^{|\psi|})$.
In terms of elimination trees, $\psi$ is used during the sampling for going from the root to the
nodes on level 1. The remaining sampling is performed as in the binary elimination tree case
described in Sec. 2.
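The sampling procedure just described can be sketched in Python as follows. The function names and the clause encoding are assumptions of this sketch; it further assumes that the clauses of $\psi$ are pairwise variable-disjoint and that no variable occurs twice within a clause.

```python
import random
from itertools import product

def sample_from_u_psi(psi, n, rng):
    """Draw an assignment uniformly from U_psi = {b : b satisfies psi}."""
    b = {}
    for clause in psi:
        variables = [abs(l) for l in clause]
        # all local assignments that satisfy the clause (2^k' - 1 of them)
        satisfying = [bits for bits in product((False, True), repeat=len(clause))
                      if any(bit == (l > 0) for bit, l in zip(bits, clause))]
        b.update(zip(variables, rng.choice(satisfying)))
    for v in range(1, n + 1):
        if v not in b:
            b[v] = rng.random() < 0.5        # remaining variables: fair coin
    return b

def size_of_u_psi(psi, n):
    """|U_psi| = 2^(n - #fixed variables) * product over clauses of (2^k' - 1)."""
    size = 2 ** (n - sum(len(c) for c in psi))
    for clause in psi:
        size *= 2 ** len(clause) - 1
    return size

psi = [[1, 2, 3], [4, -5, 6]]                # hypothetical independent 3-clauses
print(size_of_u_psi(psi, n=7), sample_from_u_psi(psi, 7, random.Random(0)))
```

Because the clauses share no variables, choosing each local assignment independently yields the uniform distribution on $U_\psi$.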

4.2 Controlling the decision between elimination and recursion 

Until now, we assumed $\psi$ already available. Now we present a method for either obtaining a
sufficiently large number of independent clauses or, if this is not possible, transforming the
k-CNF to a not too large number of (k-1)-CNFs. This method has been introduced by Hofmeister et
al. [5, 6] and works on inputs $\Phi$ and integer $\hat m$ as follows. In particular, it also
controls whether the algorithm from Sec. 3 is applied or recursive calls are made on nodes of the
elimination tree:

Starting with $\psi = \emptyset$, greedily increase $\psi$ as much as possible. Note that $\psi$
is now a maximal independent subformula.

If $|\psi| \ge \hat m$, return $\psi$.

Otherwise, use $\psi$ to generate the at most $(2^k - 1)^{\hat m}$ different (k-1)-CNFs on level 1
of the elimination tree, recursively solve #(k-1)-SAT with these formulas as input, return the sum
of the estimations, and report the overall counting task as finished.
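The greedy step and the decision taken afterwards can be sketched in Python as below. All names are assumptions of this sketch, and the recursive #(k-1)-SAT call is only indicated by a return tag rather than implemented.

```python
def greedy_independent(clauses):
    """Greedily pick pairwise variable-disjoint clauses (a maximal independent set)."""
    used, psi = set(), []
    for clause in clauses:
        variables = {abs(l) for l in clause}
        if used.isdisjoint(variables):
            psi.append(clause)
            used |= variables
    return psi

def is_maximal(psi, clauses):
    """Every clause of the formula shares a variable with some clause of psi."""
    used = {abs(l) for c in psi for l in c}
    return all(not used.isdisjoint({abs(l) for l in c}) for c in clauses)

def red_decision(clauses, m_hat):
    """Return ('psi', psi) if enough independent clauses exist, else ('recurse', psi)."""
    psi = greedy_independent(clauses)
    return ("psi", psi) if len(psi) >= m_hat else ("recurse", psi)

phi = [[1, 2, 3], [3, -4, 5], [6, 7, 8], [-1, 6, 9]]   # hypothetical 3-CNF
print(red_decision(phi, m_hat=2))
```

In the 'recurse' case, fixing the variables of `psi` shortens every remaining clause by at least one literal, which is what makes the (k-1)-CNF recursion possible.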

We will refer to this method as Red($\Phi$, $\hat m$, $\varepsilon$, $\delta$). When
Red($\Phi$, $\hat m$, $\varepsilon$, $\delta$) has been executed and returned $\psi$ with
$|\psi| \ge \hat m$, store $\psi$ and, if the pruned elimination tree algorithm from Sec. 3
reports the existence of at least $\ell$ satisfying assignments, use $\psi$ in the Monte Carlo
algorithm from Sec. 4.1. Since Red makes random decisions when solving the #(k-1)-SAT instances,
we specify in the call of Red also $\delta$ such that the probability that Red returns either an
$\varepsilon$-approximation or a set $\psi$ of at least $\hat m$ independent clauses is at least
$1 - \delta$.

Together with the modified algorithm to determine an $\ell$-cut from the previous section, we
obtain an improved RAS. Since the algorithm for solving #k-SAT for $k \ge 4$ includes the
algorithm itself for solving the occurring #(k-1)-SAT instances, the runtime has no closed form,
but can be calculated recursively. We terminate the recursion for k = 2 and use Wahlström's (even
deterministic) algorithm [18], solving #2-SAT in time $O(1.2377^n)$. Although our method works for
any k, we state our result for the cases $k \in \{3, 4\}$ only.

Theorem 2. The algorithm described above is a RAS solving #k-SAT. For k = 3, its running time
is $O(\varepsilon^{-2} \cdot 1.5181^n)$, and for k = 4, its running time is
$O(\varepsilon^{-2} \cdot (\log(\delta^{-1}) + n) \cdot 1.6105^n)$.

Proof. (a) Let k = 3. For arbitrary $\hat m$, Red either finds a set $\psi$ of $\hat m$
independent clauses or it has to solve at most $7^{\hat m}$ #2-SAT instances, each over the
remaining $n - 3\hat m$ variables. This is done (even deterministically) with Wahlström's
algorithm [18] in time $O(1.2377^{n - 3\hat m})$. So the overall running time of Red is
$T_{\mathrm{Red}} = O((7 \cdot 1.2377^{-3})^{\hat m} \cdot 1.2377^n) = O(3.6920^{\hat m} \cdot 1.2377^n)$.

As mentioned above, the running time of MC in case of $|\psi| \ge \hat m$ is at most
$T_{\mathrm{MC}} = O^*(2^n/\ell \cdot (7/8)^{\hat m})$ for $\ell \le \#\Phi$, and the running time
for finding an $\ell$-cut (or being sure that none exists) is, due to Eq. (1),
$T_{\mathrm{Cut}} = O^*(2^{\beta_3 \cdot n} \cdot \ell^{1 - \beta_3 \cdot 3/\log 7})$. The
break-even point for $T_{\mathrm{Red}}$, $T_{\mathrm{Cut}}$ and $T_{\mathrm{MC}}$ and therefore
the worst case occurs for choosing $\hat m = 0.1563\,n$ and $\ell = 1.2903^n$.
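The break-even claim for k = 3 is easy to check numerically. The Python sketch below evaluates the three bounds per variable, i.e., it returns the bases $b$ such that the respective bound is $b^n$, for $\hat m = 0.1563\,n$ and $\ell = 1.2903^n$. The function name is an assumption; the constants are the rounded ones from the proof, and polynomial factors as well as $\varepsilon$ are ignored.

```python
import math

BETA_3 = 0.3864          # rounded exponent of the randomized 3-SAT algorithm
WAHLSTROEM = 1.2377      # base of the #2-SAT algorithm used in the recursion

def bases_k3(m_hat_per_n, ell_base):
    """Per-variable bases of T_Red, T_MC and T_Cut for #3-SAT."""
    t_red = (7 * WAHLSTROEM ** -3) ** m_hat_per_n * WAHLSTROEM
    t_mc = 2 / ell_base * (7 / 8) ** m_hat_per_n
    t_cut = 2 ** BETA_3 * ell_base ** (1 - BETA_3 * 3 / math.log2(7))
    return t_red, t_mc, t_cut

print([round(t, 4) for t in bases_k3(0.1563, 1.2903)])
```

All three values agree up to rounding of the input constants, which matches the bound stated in Theorem 2 for k = 3.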
(b) The proof for the case $k \ge 4$ is analogous. The only difference is that Red is not
deterministic anymore. In order to get an $\varepsilon$-approximation with probability at least
$1 - \delta$ in the end, we require from the nested #(k-1)-SAT algorithm an
$\varepsilon$-approximation with probability at least
$1 - \delta/7^{\hat m} \le 1 - \delta/2^n$, so the necessary number of repetitions for each
recursive call of the RAS for #(k-1)-SAT has to be increased by n (additively) to ensure the
desired success probability.

For the time bound for #4-SAT, we now use the bound from (a) for #3-SAT and obtain the
claimed result by the same straightforward calculations as in the first case. In the worst case,
the parameters are $\hat m = 0.0587\,n$ and $\ell = 1.2372^n$. □

5 Taking Large Independent Structures into Account 

One can easily see that both the modified version of the Monte Carlo algorithm and the modified
method for calculating an $\ell$-cut do not require the elements of $\psi$ to be clauses. Both can
be generalized for $\psi$ being a set of pairwise independent, arbitrary subformulas with constant
size. E.g., in $\psi = \sigma_1 \wedge \sigma_2$ with
$\sigma_1 = (x_1 \vee x_2 \vee x_3) \wedge (x_1 \vee x_3 \vee x_4)$ and
$\sigma_2 = (x_5 \vee x_6) \wedge (x_5 \vee x_7)$, $\sigma_1$ and $\sigma_2$ are independent. In
order to recognize the subformulas, we write (in general)
$\psi = \{\sigma_1, \ldots, \sigma_{\hat m}\}$. We call a single subformula $\sigma$ a struct. If
every clause shares at least one variable with a clause in $\psi$, we call $\psi$ maximal. Due to
their constant size, it is possible to compute the number of their satisfying assignments. We
first show how structs can be used to improve the algorithms MC and Cut, then we show how to
construct them, and finally we present the overall RAS.

5.1 Monte Carlo counting and cut phase if many independent subformulas are known

The generalized version of MC is presented below as Algorithm 1, the generalized version of the
$\ell$-cut phase as Cut (Algorithm 2). For a subformula $\sigma$, let $n_\sigma$ denote the number
of different variables in $\sigma$, and $L_\sigma$ the number of satisfying assignments of
$\sigma$. Note that Cut uses a global counter L (only set to 0 in the first call) and globally
aborts as soon as L reaches $\ell$. Otherwise, it returns L after finishing.
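Since structs have constant size, $n_\sigma$ and $L_\sigma$ can be obtained by plain enumeration. The following Python sketch does this by brute force; the function name and the clause encoding are assumptions of the sketch. The two example structs are $\sigma_1$ and $\sigma_2$ from the beginning of this section, read as written there with positive literals only.

```python
from itertools import product

def struct_stats(sigma):
    """Return (n_sigma, L_sigma): number of variables and of satisfying assignments."""
    variables = sorted({abs(l) for clause in sigma for l in clause})
    count = 0
    for bits in product((False, True), repeat=len(variables)):
        b = dict(zip(variables, bits))
        # literal l > 0 means x_l, literal l < 0 means "not x_|l|"
        if all(any(b[abs(l)] == (l > 0) for l in clause) for clause in sigma):
            count += 1
    return len(variables), count

sigma_1 = [[1, 2, 3], [1, 3, 4]]
sigma_2 = [[5, 6], [5, 7]]
print(struct_stats(sigma_1))   # (4, 13)
print(struct_stats(sigma_2))   # (3, 5)
```

The ratio $L_\sigma / 2^{n_\sigma}$ is the factor by which a struct shrinks the sampling space.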

Our results from Sec. 4 lead to the following lemmas.

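Lemma 1. Let $\psi = \{\sigma_1, \ldots, \sigma_{\hat m}\}$ be a set of constant-sized, pairwise
independent subformulas of $\Phi$. If $\#\Phi \ge \ell$,
MC($\Phi$, $\psi$, $\ell$, $\varepsilon$, $\delta$) returns with probability at least $1 - \delta$
an $\varepsilon$-approximation for $\#\Phi$. The running time is

$$T_{\mathrm{MC}} = O^*\Big( \log(\delta^{-1}) \cdot \varepsilon^{-2} \cdot \frac{2^n}{\ell} \cdot \prod_{\sigma \in \psi} \frac{L_\sigma}{2^{n_\sigma}} \Big).$$

Proof. We sample from the set of all assignments satisfying $\psi$. The size of this set is

$$2^n \cdot \prod_{\sigma \in \psi} \frac{L_\sigma}{2^{n_\sigma}}.$$

The result follows directly from Sec. 2.1. □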

Lemma 1 generalizes the result of Sec. 4.1, where we used independent clauses, to the new
situation, enabling the use of more complex, but still pairwise independent subformulas. The same
generalization can be applied to the cut phase from Sec. 3.
Algorithm 1: MC($\Phi$, $\psi$, $\ell$, $\varepsilon$, $\delta$)

input  : k-SAT instance $\Phi$, set $\psi = \{\sigma_1, \ldots, \sigma_{\hat m}\}$ of independent
         subformulas, parameters $\ell$, $\varepsilon$, $\delta$
output : w.p. $1 - \delta$: $\varepsilon$-approximation on $\#\Phi$

$U := 2^n \cdot \prod_{\sigma \in \psi} L_\sigma / 2^{n_\sigma}$;  $T := \log(\delta^{-1}) \cdot U / (\varepsilon^2 \cdot \ell)$;  $L := 0$;
repeat T times
    forall the $\sigma \in \psi$ do
        assign u.a.r. one of the assignments satisfying $\sigma$ to its variables;
    forall the $x \in \mathrm{Vars}(\Phi)$ not already fixed do
        assign u.a.r. 0 or 1 to x;
    if $\Phi$ is satisfied then
        L := L + 1;
return $L/T \cdot U$;
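A possible Python rendering of the sampling loop of Algorithm 1 is given below. It is a sketch under stated assumptions: the names are hypothetical, the satisfying assignments of each struct are enumerated by brute force (which is affordable because structs have constant size), the number of samples is passed in directly instead of being derived from $\ell$, $\varepsilon$ and $\delta$, and the structs are assumed to be pairwise variable-disjoint.

```python
import random
from itertools import product

def satisfies(b, clauses):
    """True if assignment b (dict variable -> bool) satisfies all clauses."""
    return all(any(b[abs(l)] == (l > 0) for l in clause) for clause in clauses)

def struct_models(sigma):
    """Variables and all satisfying assignments of a constant-size struct."""
    variables = sorted({abs(l) for clause in sigma for l in clause})
    models = [dict(zip(variables, bits))
              for bits in product((False, True), repeat=len(variables))
              if satisfies(dict(zip(variables, bits)), sigma)]
    return variables, models

def mc_structs(phi, psi, n, num_samples, rng):
    """Estimate #phi by sampling uniformly from the assignments satisfying psi."""
    tables = [struct_models(sigma) for sigma in psi]
    size_u = 2 ** (n - sum(len(variables) for variables, _ in tables))
    for _variables, models in tables:
        size_u *= len(models)              # U = 2^n * prod L_sigma / 2^{n_sigma}
    hits = 0
    for _ in range(num_samples):
        b = {}
        for _variables, models in tables:
            b.update(rng.choice(models))   # uniform satisfying assignment of sigma
        for v in range(1, n + 1):
            if v not in b:
                b[v] = rng.random() < 0.5  # variables outside psi: fair coin
        hits += satisfies(b, phi)
    return hits / num_samples * size_u

# Hypothetical formula: the two example structs plus one connecting clause
phi = [[1, 2, 3], [1, 3, 4], [5, 6], [5, 7], [2, 5, 8]]
psi = [phi[0:2], phi[2:4]]
print(mc_structs(phi, psi, n=8, num_samples=5000, rng=random.Random(0)))
```

For this example the sampling space has 130 elements instead of 256, so noticeably fewer samples are needed for the same accuracy.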



An important difference is that in Sec. 3 we could assume that for each node $\varphi$ there
always is a clause to generate new nodes since otherwise calculating $\#\varphi$ would be trivial.
Now it may happen that $\psi$ becomes empty. In this case, we fall back to using single clauses,
but due to the maximality of $\psi$, those clauses have length at most k - 1. The following lemma
states the running time of Cut in both cases.

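Lemma 2. If $\#\Phi < \ell$, then, with probability at least $1 - \delta$,
Cut($\Phi$, $\psi$, $\ell$, $\delta$) returns $\#\Phi$ exactly. If
$\ell \le \prod_{\sigma \in \psi} L_\sigma$, let
$\psi' = \{\sigma_1, \ldots, \sigma_{\hat m'}\} \subseteq \psi$ denote a subset of $\psi$, where
$\hat m'$ is chosen² such that
$\prod_{j=1}^{\hat m'-1} L_{\sigma_j} < \ell \le \prod_{j=1}^{\hat m'} L_{\sigma_j}$. If
$\ell > \prod_{\sigma \in \psi} L_\sigma$, set
$\hat m' := \hat m + \log_{2^{k-1}-1}(\ell / \prod_{\sigma \in \psi} L_\sigma)$. Then the running
time of Cut is

$$T_{\mathrm{Cut}} = \begin{cases}
O^*\Big( \log(\delta^{-1}) \cdot 2^{\beta_k \cdot n} \cdot \prod_{j=1}^{\hat m'} \frac{L_{\sigma_j}}{2^{\beta_k n_{\sigma_j}}} \Big) & \text{if } \ell \le \prod_{\sigma \in \psi} L_\sigma, \\
O^*\Big( \log(\delta^{-1}) \cdot 2^{\beta_k \cdot n} \cdot \Big( \frac{2^{k-1}-1}{2^{\beta_k (k-1)}} \Big)^{\hat m' - \hat m} \cdot \prod_{j=1}^{\hat m} \frac{L_{\sigma_j}}{2^{\beta_k n_{\sigma_j}}} \Big) & \text{otherwise.}
\end{cases}$$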



²Roughly speaking, $\hat m'$ is the minimum number of elements of $\psi$, respectively additional
clauses, that have to be chosen by Cut until the total of $\ell$ leaves of the elimination tree
can be achieved.

Algorithm 2: Cut($\Phi$, $\psi$, $\ell$, $\delta$)   // Note: L is a global variable

input  : k-SAT instance $\Phi$, set $\psi$ of subformulas, parameters $\ell$, $\delta$
output : w.p. $1 - \delta$: either $\#\Phi$ exactly, or the message that $\#\Phi \ge \ell$

if $\Phi$ is unsatisfiable (w.p. at least $1 - \delta/2^n$) then
    return L;
else
    L := L + 1;
    if $L \ge \ell$ then
        global abort;
    if $\psi \ne \emptyset$ then
        choose the first $\sigma \in \psi$;
    else
        choose $\sigma \in \Phi$;
    forall the assignments b to Vars($\sigma$) satisfying $\sigma$ do
        Cut($\Phi_b$, $\psi \setminus \{\sigma\}$, $\ell$, $\delta$);
return L;

Proof. Assume $\#\Phi < \ell$, so the algorithm searches for a pruned elimination tree with $\ell$
leaves. Since $2^n$ is a rigorous bound on the number of formulas that have to be tested on
satisfiability, the probability that the satisfiability check always gives the right answer is at
least $1 - 2^n \cdot \delta/2^n = 1 - \delta$.

Now we focus on the run-time analysis. Basically, for each node of the tree, at most
$2^{n_\sigma} - 1$ different k-SAT instances must be solved. So we have to analyze for each node
$\varphi$, how many variables of $\varphi$ are fixed. On the i-th level of the tree, for
$i \le \hat m$, the variables of the first i subformulas of $\psi$ are fixed. If $i > \hat m$, all
the variables of $\psi$ and additionally the variables of $i - \hat m$ of the remaining clauses
are fixed. Note that due to the maximality of $\psi$, those clauses have size at most k - 1. Let
$N_i$ denote the number of nodes in level i. For $i \le \hat m$, the nodes on level i have
$\sum_{j=1}^{i} n_{\sigma_j}$ fixed variables, so the time $T_i$ required for processing only the
nodes on level i is

$$T_i = O^*\Big( (n + \log(\delta^{-1})) \cdot N_i \cdot 2^{\beta_k \cdot (n - \sum_{j=1}^{i} n_{\sigma_j})} \Big)
= O^*\Big( \log(\delta^{-1}) \cdot N_i \cdot 2^{\beta_k \cdot n} \cdot \prod_{j=1}^{i} 2^{-\beta_k n_{\sigma_j}} \Big).$$

For $i > \hat m$, the number of fixed variables is
$\sum_{j=1}^{\hat m} n_{\sigma_j} + (k-1) \cdot (i - \hat m)$. In this case, we obtain

$$T_i = O^*\Big( \log(\delta^{-1}) \cdot N_i \cdot 2^{\beta_k \cdot n} \cdot 2^{-\beta_k (k-1) \cdot (i - \hat m)} \cdot \prod_{j=1}^{\hat m} 2^{-\beta_k n_{\sigma_j}} \Big).$$

Since a node on level i - 1 has out-degree at most $L_{\sigma_i}$ for $i \le \hat m$ and
$2^{k-1} - 1$ otherwise, on level i there are at most $N_i \le \prod_{j=1}^{i} L_{\sigma_j}$ nodes
if $i \le \hat m$ and $N_i \le (2^{k-1} - 1)^{i - \hat m} \prod_{j=1}^{\hat m} L_{\sigma_j}$ nodes
if $i > \hat m$. Of course, $N_i \le \ell$ is also a bound we have for every $N_i$. Let h be the
maximum length of a path from the root to a leaf. Then the overall running time
$T_{\mathrm{Cut}}$ of Cut is (where we omit the $O^*(\cdot)$
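and the factor $\log(\delta^{-1})$)

$$\begin{aligned}
T_{\mathrm{Cut}} &= \sum_{i=0}^{h} T_i = \sum_{i=0}^{\hat m} T_i + \sum_{i=\hat m+1}^{h} T_i \\
&\le 2^{\beta_k n} \cdot \Bigg( \sum_{i=0}^{\hat m} \min\Big\{ \ell, \prod_{j=1}^{i} L_{\sigma_j} \Big\} \cdot \prod_{j=1}^{i} 2^{-\beta_k n_{\sigma_j}} \\
&\qquad\qquad + \sum_{i=\hat m+1}^{h} \min\Big\{ \ell, (2^{k-1}-1)^{i-\hat m} \prod_{j=1}^{\hat m} L_{\sigma_j} \Big\} \cdot 2^{-\beta_k (k-1)(i-\hat m)} \prod_{j=1}^{\hat m} 2^{-\beta_k n_{\sigma_j}} \Bigg) \\
&= 2^{\beta_k n} \cdot \Bigg( \sum_{i=0}^{\hat m} \min\Big\{ \ell \cdot \prod_{j=1}^{i} 2^{-\beta_k n_{\sigma_j}}, \; \prod_{j=1}^{i} \frac{L_{\sigma_j}}{2^{\beta_k n_{\sigma_j}}} \Big\} \\
&\qquad\qquad + \sum_{i=\hat m+1}^{h} \min\Big\{ \ell \cdot 2^{-\beta_k (k-1)(i-\hat m)} \prod_{j=1}^{\hat m} 2^{-\beta_k n_{\sigma_j}}, \; \Big( \frac{2^{k-1}-1}{2^{\beta_k (k-1)}} \Big)^{i-\hat m} \prod_{j=1}^{\hat m} \frac{L_{\sigma_j}}{2^{\beta_k n_{\sigma_j}}} \Big\} \Bigg).
\end{aligned}$$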



One can easily see that in both sums, the first term inside the min is decreasing in i. Since for
any $\sigma$ we are going to use, we have $L_\sigma / 2^{\beta_k n_\sigma} > 1$ (and since
otherwise the runtime of the algorithm would be even better), the second term inside each min is
increasing in i. So, up to a constant factor, the sum is equal to the summand where both terms
inside the minimum have the same value, which is the case for $i = \hat m'$. This finishes the
proof. □

Note that, in the case of a node on level $i > \hat m$, Cut would not have to solve a k-SAT
instance anymore but only a (k-1)-SAT instance. However, such considerations would lead to only a
small improvement of the running time, so we sacrificed the benefit from this observation in order
to simplify our analysis.

5.2 Finding large sets of independent subformulas 

It remains to provide some method that collects the set $\psi$ of subformulas. We call these
subformulas structs and, in order to compute them, we generalize algorithm Red from Sec. 4.2 as
follows. We start with the $\hat m$ independent clauses as initial set of structs. Now we search
iteratively for further clauses in $\Phi$ that extend in a controlled way the structs we already
have such that they remain independent. Since the structs must have constant size, there are
structs that we do not want to extend anymore. We call a variable x occurring in struct $\sigma$
closed, if we do not want to add further clauses to $\sigma$ that contain x. $\sigma$ is called
closed if it has only closed variables. For example, assume we already found the two structs
$\sigma_1 = \{(x_1 \vee x_2), (x_2 \vee x_3)\}$ and $\sigma_2 = \{(x_4 \vee x_5)\}$ where $x_2$
and $x_5$ are closed variables. The clause $(x_2 \vee x_6) \in \Phi$ would not be considered
because it contains the closed variable $x_2$. But the clause $(x_3 \vee x_4) \in \Phi$ can be
used to extend and connect the two structs to the single struct
$\sigma_3 = \{(x_1 \vee x_2), (x_2 \vee x_3), (x_3 \vee x_4), (x_4 \vee x_5)\}$ which is, in
our example, a closed struct.

Of course, there is no guarantee that the extension phase runs until every struct is closed. In
the above example, without the clause $(x_3 \vee x_4)$ only the two non-closed structs can be
obtained. But if we ensure that each struct has at least one closed variable in every clause, the
set $\psi$ is maximal, meaning that every clause in $\Phi$ contains at least one variable that is
a closed variable of some struct $\sigma \in \psi$. So, in order to reduce $\Phi$ to a #(k-1)-SAT
instance, one only has to fix the closed variables. If Red finds a large number of structs, then
MC (Algorithm 1) runs faster. If there are only a few, then fixing their closed variables leads to
only a few #(k-1)-SAT instances to be solved. For a given struct $\sigma$, let $f_\sigma$ denote
the number of closed variables and $W_\sigma$ the number of assignments to the closed variables
that do not already violate a clause of $\sigma$, i.e., $f_\sigma = n_\sigma$ and
$W_\sigma = L_\sigma$ for every closed struct.

The algorithm for grouping the structs is given in Algorithm 3. $\alpha_k$ refers to a constant
determined in the proof of Theorem 3 below. One can think of it as the value such that for the
#k-SAT problem an $\varepsilon$-approximation can be obtained in time
$O^*(\varepsilon^{-2} \cdot \log(\delta^{-1}) \cdot \alpha_k^n)$ with probability at least
$1 - \delta$. The algorithm accesses a library, containing for every occurring struct $\sigma$ the
closed variables. This library assures that the size of the structs is constant.



Algorithm 3: Red($\Phi$, $\ell$, $\varepsilon$, $\delta$)

input  : k-SAT instance $\Phi$, parameters $\ell$, $\varepsilon$, $\delta$
output : either a set $\psi$ of structs or, w.p. $1 - \delta$: $\varepsilon$-approximation L of $\#\Phi$

while there is a clause $C \in \Phi$ that contains no closed variable of a struct in $\psi$ do
    $\chi$ := all structs of $\psi$ that have a variable with C in common;
    $\sigma$ := new struct created from C and the structs from $\chi$;
    $\psi := (\psi - \chi) \cup \{\sigma\}$;

/* $\alpha_k$: const. determined in Thm. 3 */
if $\alpha_{k-1}^n \cdot \prod_{\sigma \in \psi} W_\sigma \cdot \alpha_{k-1}^{-f_\sigma} < \alpha_k^n$ then
    L := 0;
    forall the assignments b to the variables in $\psi$ that satisfy $\psi$ do
        L := L + IndepSubform_RAS($\Phi_b$, $\varepsilon$, $\delta/2^n$);
    return L;
else
    return $\psi$;



Lemma 3. Assuming #(k-1)-SAT can be approximated in time $O^*(\alpha_{k-1}^n)$,
Red($\Phi$, $\ell$, $\varepsilon$, $\delta$) (Algorithm 3) returns either w.p. at least
$1 - \delta$ an $\varepsilon$-approximation of $\#\Phi$ or a set $\psi$ of pairwise independent
structs such that
$\alpha_{k-1}^n \cdot \prod_{\sigma \in \psi} W_\sigma / \alpha_{k-1}^{f_\sigma} \ge \alpha_k^n$.
Red runs in time $O^*(\varepsilon^{-2} \cdot \log(\delta^{-1}) \cdot \alpha_k^n)$.

Proof. If $\alpha_{k-1}^n \cdot \prod_{\sigma \in \psi} W_\sigma / \alpha_{k-1}^{f_\sigma}$ is at
least $\alpha_k^n$ after the while-loop, which obviously runs in polynomial time, is finished,
then the algorithm returns the set $\psi$ with the claimed property. Otherwise it enumerates all
assignments for the closed variables of the structs of $\psi$ that do not already cause $\psi$ to
be evaluated to 0. Since the structs in $\psi$ are pairwise independent, there are exactly
$\prod_{\sigma \in \psi} W_\sigma$ such assignments. For each of these assignments, Red starts an
algorithm that w.p. at least $1 - \delta/2^n$ returns an $\varepsilon$-approximation of the
resulting #(k-1)-SAT instances. The probability that each of the
$\prod_{\sigma \in \psi} W_\sigma$ calls actually returns an $\varepsilon$-approximation is
therefore at least $1 - \delta/2^n \cdot \prod_{\sigma \in \psi} W_\sigma \ge 1 - \delta$. Since
$\sum_{\sigma \in \psi} f_\sigma$ variables are fixed, each of the #(k-1)-SAT instances has
$n - \sum_{\sigma \in \psi} f_\sigma$ variables and to approximate all of them within the given
parameters takes time

$$O^*\Big( \prod_{\sigma \in \psi} W_\sigma \cdot \varepsilon^{-2} \cdot (n + \log(\delta^{-1})) \cdot \alpha_{k-1}^{\,n - \sum_{\sigma \in \psi} f_\sigma} \Big)
= O^*\big( \varepsilon^{-2} \cdot \log(\delta^{-1}) \cdot \alpha_k^n \big).$$

This finishes the proof. □



5.3 All things come together: The new randomized approximation scheme 

Combining all results, we are now able to state our main algorithm that solves #k-SAT.
IndepSubform_RAS (Algorithm 4) shows how to combine the Algorithms 1, 2 and 3.



Algorithm 4: IndepSubform_RAS

input  : k-CNF $\Phi$, $\varepsilon$, $\delta$
output : Approximated number L of satisfying assignments of $\Phi$

if Red($\Phi$, $\ell$, $\varepsilon$, $\delta$) returns L then
    return L
else
    $\psi$ := set of structs returned by Red;
    if Cut($\Phi$, $\psi$, $\ell$, $\delta$) returns value $L < \ell$ then
        return L
    else
        return MC($\Phi$, $\psi$, $\ell$, $\varepsilon$, $\delta$);



Theorem 3. IndepSubform_RAS (Algorithm 4) is a RAS running in time
$O(\varepsilon^{-2} \cdot \log(\delta^{-1}) \cdot 1.51426^n)$ for #3-SAT and in time
$O(\varepsilon^{-2} \cdot \log(\delta^{-1}) \cdot 1.60816^n)$ for #4-SAT.

Proof. The results follow from calculating the break-even points of the time bounds in Lemma 1,
Lemma 2 and Lemma 3. For our version of algorithm Red, we used the structs and made the
decisions about which variable to declare closed as described in Table 2 and 3, resp. We declared
every other struct that is not listed as closed (by setting all its variables closed) and set in
Algorithm 3 $\alpha_3 = 1.51426$ for k = 3 and $\alpha_4 = 1.60816$ for k = 4.

For k = 3, in the worst case, $\ell = 1.28794^n$ and there are $0.05252\,n$ closed structs of the
form $\{(x_1 \vee x_2 \vee x_3), (x_1 \vee x_4 \vee x_5), (x_2 \vee x_6 \vee x_7), (x_4 \vee x_8 \vee x_9)\}$.
For k = 4, in the worst case, $\ell = 1.23823^n$ and there are $0.01785\,n$ closed structs of the
form $\{(x_1 \vee x_2 \vee x_3 \vee x_4), (x_1 \vee x_5 \vee x_6 \vee x_7), (x_2 \vee x_8 \vee x_9 \vee x_{10}), (x_5 \vee x_{11} \vee x_{12} \vee x_{13})\}$.
This leads to the claimed bounds. □

Note that, for any k, our method runs in time $O^*(\alpha_k^n)$ with $\alpha_k$ depending on
$\beta_k$ and $\alpha_{k-1}$. For all k, $\alpha_k < 2^{1/(2-\beta_k)}$ (Thurley's running time),
even if we define structs consisting of just a single clause as closed. E.g.,
$\alpha_5 \approx 1.6694 < 1.6712$.
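The comparison value for k = 5 can be recomputed from the PPSZ constant. The Python sketch below assumes the usual series $\mu_k = \sum_{j \ge 1} 1/(j \cdot (j + 1/(k-1)))$ together with $\beta_k = 1 - \mu_k/(k-1)$ from Sec. 2.2; the function names and the truncation point of the series are choices made for this sketch.

```python
import math

def mu(k, terms=200_000):
    """Truncated PPSZ series sum_{j>=1} 1 / (j * (j + 1/(k-1)))."""
    a = 1 / (k - 1)
    partial = sum(1 / (j * (j + a)) for j in range(1, terms + 1))
    return partial + 1 / terms          # the neglected tail is roughly 1/terms

def thurley_base(k):
    """Base 2^{1/(2 - beta_k)} of Thurley's bound, with beta_k = 1 - mu_k/(k-1)."""
    beta_k = 1 - mu(k) / (k - 1)
    return 2 ** (1 / (2 - beta_k))

print(round(mu(3), 4), round(4 - 4 * math.log(2), 4))   # closed form for k = 3
print(round(thurley_base(5), 4))
```

The first line compares the truncated series for k = 3 with its closed form $4 - 4 \ln 2$ as a sanity check of the truncation.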
Table 2: Non-closed structs for k = 3

Type $\sigma$                                                                          $L_\sigma$   closed Vars.
$\{(x_1 \vee x_2 \vee x_3)\}$                                                          7
$\{(x_1 \vee x_2 \vee x_3), (x_1 \vee x_4 \vee x_5)\}$                                 25           $\{x_1\}$
$\{(x_1 \vee x_2 \vee x_3), (x_1 \vee x_2 \vee x_4)\}$                                 13           $\{x_1\}$
$\{(x_1 \vee x_2 \vee x_3), (x_1 \vee x_4 \vee x_5), (x_2 \vee x_6 \vee x_7)\}$        89

Table 3: Non-closed structs for k = 4

Type $\sigma$                                                                                                          $L_\sigma$   closed Vars.
$\{(x_1 \vee x_2 \vee x_3 \vee x_4)\}$                                                                                 15           $\{x_4\}$
$\{(x_1 \vee x_2 \vee x_3 \vee x_4), (x_1 \vee x_5 \vee x_6 \vee x_7)\}$                                               113
$\{(x_1 \vee x_2 \vee x_3 \vee x_4), (x_1 \vee x_5 \vee x_6 \vee x_7), (x_2 \vee x_8 \vee x_9 \vee x_{10})\}$          851          $\{x_1, x_2\}$